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ABSTRACT 

The aim of this paper is to present the recently proposed 
fluid diffusion based algorithm in the general context of 
the matrix inversion problem associated to the Gauss-Seidel 
method. We explain the simple intuitions that are behind 
this diffusion method and how it can outperform existing 
methods. Then we present some theoretical problems that 
are associated to this representation as open research prob- 
lems. We also illustrate some connected problems such as 
the graph transformation and the PageRank problem. 

Categories and Subject Descriptors 

G.1.3 [Mathematics of Computing]: Numerical Anal- 
ysis — Numerical Linear Algebra; G.2.2 [Discrete Mathe- 
matics]: Graph Theory — Graph algorithms 

General Terms 
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1. INTRODUCTION 

In this paper, we revisit the very well known linear algebra 
equation problem: 

A.X = B 

where A is a square matrix of size N x N and B a vec- 
tor of size A'' with unknown X. There are many known 
approaches to solve such an equation: Gaussian elimina- 
tion, Jacobi iteration, Gauss-Seidel iteration, SOR (succes- 
sive over-relaxation), Richardson, Krylov, Gradient method, 
etc cf. [illlS]). 

In this paper, we propose a new iteration algorithm based 
on the decomposition of matrix-vector product as a fluid 
diffusion model. This algorithm has been initially proposed 
in the context of PageRank problem [5]. The fluid diffusion 
idea was first introduced in pQ. 

1.1 Different iterative methods 

The solution of A.X = B can be solved in particular with 
iterative methods such as Jacobi or Gauss-Seidel iterations, 
when A satisfies certain conditions (e.g. strictly diagonally 
dominant or symmetric and positive definite). 



We recall the Jacobi iteration defined by the formula: 
xr^' = ^L-^a,.fA,i = l,2,..,Ar. 

V / 

and the Gauss-Seidel iteration: 

4'^'' = ;^ ^ - E ' - E -^M'^'^] , * = 1. 2, .., A. 

an \ I 

\ j>i ]<i / 

Both iterations are element-wise formula. The main dif- 
ference is that in Jacobi iteration the computation of A*-''"'"^^ 
uses only the elements of A^*' (similar to power iteration 
method in eigenvector problem) whereas in Gauss-Seidel it- 
eration the computation of a::^ exploits the elements of 
^(fc+i) ^jjg^^ have already been computed for j < i. And 
this is the main reason which explains why the Gauss-Seidel 
method is generally more efficient than Jacobi iteration (when 
the convergence condition is satisfied). However, unlike the 
Jacobi method, the Gauss-Seidel method is not adapted for 
a distributed computation, because the values at each itera- 
tion are dependent on the order of the choice on i and there 
is not much freedom. 

The approach proposed here, which we will call D-iteration 
(as Diffusion based iteration), is a new improvement idea 
that exploits the progressive update at the vector entry level 
as Gauss-Seidel with the advantage of being iteration order 
independent which makes it a very interesting candidate for 
an asynchronous distributed computation. 

1.2 Collection vs Diffusion methods 

For the sake of simplicity and intuitive explanation, we 
associate the Gauss-Seidel method to an operation of collec- 
tion (one entry of vector is updated based on the previous 
vector based on the incoming links). 




Collection Diffusion 



Figure 1: Intuition: collection vs diffusion. 

Our approach consists in an operation of diffusion (the 



fluid difi'usion from one entry of the vector consists in up- 
dating all children nodes following the outgoing links) (cf. 
Figure [1} . When the iteration is based on vector level up- 
date (such as Jacobi iteration or Power iteration), the col- 
lection or diffusion approaches become equivalent (full cycle 
operations on all entry of the vector). 
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Figure 2: Intuition: matrix product decomposition. 

Somehow, those two types of operations can be seen as 
dual operations, but with different consequences. With the 
diffusion approach, we have a very powerful result on the 
convergence (monotone with an explicit information on the 
distance to the limit, cf. [5]) and more importantly on the 
independence in the order of vector entries on which the 
diffusions are applied. 
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Figure 3: Intuitive comparison. 

From the intuitive point of view, D-iteration can outper- 
form the Gauss-Seidel approach by the appropriate opti- 
mized choice of the sequence of the vector coordinates for 
the diffusion and by its distributive computation. 

2. D-ALGORITHM 
2.1 Notations 

We reuse here the notation introduced in [S]: P is a square 
matrix of size N x N such that each column sums up to less 
than one (stochastic or sub-stochastic matrix). 

The D-algorithm has been initially defined in the PageR- 
ank eigenvector context associated to the iteration of an 
equation of the form: 



A.Xr, 



(1) 



where A is a matrix of size N x N which can be explicitly 
decomposed as: 



where V is a normalized vector of size N (in the context of 
PageRank, the vector V is a. personalized initial condition 
cf. [7]) and 1 is the column vector with all components equal 
to one. So we have: 



X„+l = dPXr, + (1 - d)V. 



(3) 



The idea of the fluid diffusion is associated to the compu- 
tation of the power series: 

oo 

= (1 - d) ^/'pV = (1 - d){Ia ~ dpy'v = X 

k = 

where Id is the identity matrix and which deflnes the limit 
of the equation 

We deflne Jk a matrix with all entries equal to zero except 
for the fc-th diagonal term: {Jk)kk ~ 1- 

In the following, we assume given a deterministic or ran- 
dom sequence I = {ii, 12, in, ■■■} with i„ G {1, .., A'^}. We 
only require that the number of occurrence of each value 
k £ {1, .., TV} in / to be infinity. 

We recall the deflnition of the two vectors used in D- 
iteration: the fluid vector F associated to / by: 



Po = {l-d)V 

P„ = dP.h„Fn-x+Y^ JkFn- 

kjtin 

= {Id - ■K+dPJ,JF^-r. 
And the history vector H by: 

n 

Hn ~ y ] Jii.Fk-1- 



(4) 

(5) 

(6) 
(7) 



A = dP+{l-d)V.l' 



(2) 



It is shown in that II„ satisfies the iterative equation: 

H„ = {Id - J,^{Id - dP)) Hn-i + J,„{1 - d)V 

and that H„ converges to the limit X whatever the choice of 
the sequence I. The Li norm of F„ gives the exact distance 
to the limit, if the matrix P has no zero column vector 
(otherwise, it defines an upper bound of the distance). 

2.2 Pseudo-code 

We recall here the pseudo-code presented in [5]: 

Initialization: 
H[i] := 0; 
F[i] := F_0[i] ; 
r := IFI; 

Iteration: 
k := 1; 

While ( r/(l-d) > Target_Error ) 
Choose i_k; 
sent : = F [i_k] ; 
H[i_k] += sent; 
F[i_k] := 0; 
If ( i_k has no child ) 

r -= F[i_k] ; 
else 

For all child node j of i_k: 

F[j] += sent*p(j ,i_k)*d; 
r -= F[i_k]*(l-d) ; 
k++: 



Ho is initialized to and Fo to (1 — d)V when associated 
to the equation |3] (the constant vector). In the case of the 
hnear equation A.X = B, Fq is initiahzed to B or cB (cf. 
Section [2^411. 



2.3 Convergence condition 

Here we only discuss the sufficient convergence condition 
based on the diagonally dominant matrix. 

2.3.1 Diagonal dominant 

We recall that A is strictly diagonally dominant (by columns) 

if: 

\aii\ > ^ |aji|, for all i. 

For the Gauss-Seidel iteration, it is more natural to con- 
sider the row version of the strictly diagonally dominant 
condition, whereas for the diffusion point of view the col- 
umn version is natural. However, as for the Gauss-Seidel 
convergence condition, both conditions guarantee the con- 
vergence of the D-iteration. 

2.3.2 Fluid diffusion reduction 

We say that A satisfies the fluid diffusion reduction con- 
dition if: 



El 



< 1, for all i. 



A strictly sub-stochastic matrix (for all columns) is a spe- 
cific case satisfying such a condition. This can be seen as a 
specific case of contractive matrix. 

Here, we could also define the row version for the diffusion 
reduction. 

Finally, as for the convergence condition of the Gauss- 
Seidel iteration, the D-iteration converges if for at least one 
column, we have the fiuid diffusion reduction and for the 
other we have the equality X^jLi ~ 1 when A is irre- 
ducible. 

2.4 Connection to the linear equation A.x = B 

We can rewrite the equation A.X — B as: 

X = {Id- A).X + B. 

Then, if X is a normalized probability vector X^^^ ooi = 1, 
we can replace B by B.l^.X to get: 

X = P.X 

with P = {Id — A) + B.l*. We can recognize here the specific 
case of the PageRank equation [8] for which B = (1 — d)V 
and Id-A^dP ([S1I2]). 

Equivalently, from an affine iteration equation X„+i = 
PXn + B, we can associate to its limit X a linear equation 
A.X = B with A = Id- P. 

Now we can rewrite A.X — B as cA.X — cB for any c > 
and 

X = {Id-cA).X + cB. 

We define P{c) = Id — cA. Without any loss of generality, 
we can assume that all diagonal terms of A are positive 
(otherwise we can multiply the corresponding line vector of 
A and the corresponding B's entry by -1). 



Theorem 1. A is strictly diagonally dominant (per col- 
umn), if and only if for all c < — ^ | ^ | (and c > 0), 

P{c) satisfies the fluid diffusion reduction condition. 

Proof. When A is strictly diagonally dominant (per col- 
umn), then for all c < 



El(-PW)«,| = l-c|a.,|+;^c|a,,| < 1. 

The last inequality uses exactly the strictly diagonally dom- 
inant property. Therefore this defines a necessary and suffi- 
cient condition. 

2.5 Connection to the general eigenvector prob- 
lem: X = P.X 

Here we assume that we want to solve the eigenvector 
equation X = P.X and that we have a square matrix 7? 
such that R.X = V . Then we have: 



X 



(P-R).X + V. 



If the spectral radius of P — _R is strictly less than 1, we can 
apply the D-iteration to compute X. 

If P is a transition matrix and if we are looking for a 
probability vector X — P.X, we have in particular R = 
{a/N)J where J is a matrix with all entries equal to 1 and 
a > (in case of PageRank equation, we use R = {{1 — 
d)/N)J). 

We illustrate this through a simple example: take the 
transition matrix P = (^'^ ^'i'-' want to find the 
stationary probability X = P.X. 



0.5 




=> 



Node 1 



Node 2 




Figure 4: Example: solving X = P.X 



Then we have (for q = 1): P ~ R 



^0 0.5 
^0 -0.5y 

Then X is can be obtained by applying the D-iteration on 
(P — R, [0.5, 0.5]) or by the link elimination process defined 
in the following section to obtain X = [2/3, 1/3]. 

One simple sufficient condition on a stochastic matrix P 
on which the D-iteration is convergent is given by the fol- 
lowing theorem. 

Theorem 2. Let P be a stochastic matrix (non-negative 
matrix such that each column sums up to 1 ). Define N'^{i, a) - 
\{j ■ Pji ^ ck/-^}| 0,^ ^-he number of i-column entry of P 
larger than (or equal to) a/N. If there exists a > 0, such 
that for all i, N'^{i,a) > N/2, then the D-iteration on 
(P - {a/N)J, (l/iV)l) IS convergent. 

Proof. Under the above condition, we show that P — 
(a/N) J satisfies the fluid diffusion reduction condition. We 



have: 



^ Ip,, - a/N\ = 

i 

ij>a/N j <a/N- 



Figure [6] illustrates a simple example: the suppression of 
the link pii =0.75 implies a multiplication by 1/(1 — 0.75) = 
4 of (Fo)i and P2i, so that we have in this case: 

X([[0.75, 0.5], [0.1, 0]]; [1, 1]) = X{[[0, 0.5], [0.4, 0]]; [4, 1]) 



And we have "^^iPij ~ li therefore "^^iiPij ~ ct/N) = 1 — a 
and 



/N 



/JV- 



Hence 



Y^\p,j-a/N\ = 2x^|p.j-Q/iV|lp^^<„/jv + l-a 

i i 

< 2 X ^ a/iVlp., «./iv + 1 - Q 

i 

< 2x{N -N+{i,a))a/N + l-a 

< 1. 

Remark 1. If P is irreducible, it is sufficient to have 
N'^{i,a) > N/2 and for at least one column, a strict in- 
equality. 

Remark 2. Obviously, the practical condition of the above 
theorem is to have N'^ (i) = \{j : pji > 0}| > N/2 for all i. 

3. LINK ELIMINATION 

The fluid diffusion model can be in the general case de- 
scribed by the matrix P associated with a weighted graph 
{pij is the weight of the edge from j to i) and the initial con- 
dition Fo- So if there is a unique limit X from this setting, 
we can set X = X{P,Fo), which means that X is the limit 
of the D-iteration applied on (P, Fo). 

3.1 Diagonal link elimination 

Thanks to the freedom on the sequence /, we have the 
following result: 

Theorem 3. We have the equality (suppression of all di- 
agonal term pu such that pu ^ 1; anyway if pa > 1, the 
D-iteration diverges): X{P,Fo) = X{P',Fq) where 

• = (Fo), X ^ and {P')u = 0; 

• and if i / j, {P')ij = pij . 



<=> 



Figure 5: Diagonal elimination. 



Proof. The result is straightforward noticing that the 
self-diffusion pu is to be applied to {Fo)i and to all fluids 
coming from incoming links. Such an operation is equivalent 
to the product by j^^. 




1 <=> 



Figure 6: Diagonal elimination: example. 

3.2 Non-diagonal link elimination 

We can extend the above operation in a general case when 
any link pij is suppressed (but after the diagonal suppres- 
sion) . 

Theorem 4. We assume that pi^i = 0. Then, we have 
the equality (elimination of the link pjni from i' to j' , j' 7^ 
i'): X{P, Fo) = X{P', Fi) where 

•forjT^j', {F(,)j^{Fo)j; 

• {F(>)j' = {Fo)r+PjH' x(Fo),,; 

• and {P')ji = Pji except for all i an origin node of an 
incoming link to i' , {P')jii = Pjn + PjH' x Pi'i. 

Proof. The result is straightforward noticing that the 
elimination of the link from i' to j' affects only the fluid 
that goes to j' with the D-iteration: therefore we need to 
push to j' the initial fluid (fo)^/ and add a new link (or 
modify the weight of the existing one) from all origin nodes 
of an incoming link to i' which would replace the fluid going 
from i to j' though i' . 



<=> 




Fij') + p(i'.i')'F(r 



Figure 7: Link pjiii elimination. 

Figure [8] illustrates a simple example: the suppression of 
the link P21 ~ 0.5 implies an addition of a link P22 = 0-1- 
0.5 * 0.4 = 0.2 and the addition of 0.5 x 4 = 2 on (^0)2 to 
get the equality: 

X{[[0, 0.5], [0.4, 0]]; [4, 1]) = X([[0, 0], [0.4, 0.2]]; [4, 3]) 



If we continue on suppressing the diagonal link P22, we 
get: 

X{[[0, 0], [0.4, 0.2]]; [4, 3]) = X([[0, 0], [0.4, 0]]; [4, 3.75]) 



Therefore, we get: X = [4 -f 0.4 * 3.75, 3.75] = (6, 3.75). 
The above link suppression operation can be compared to 
the direct Gauss elimination method to solve A.X = B (in 



0.5*0.4 = 0.2 




and if i 7^ j: 



1 <=> 



0.4 

Figure 8: Link p2i elimination: example. 

0.2 



<=> 



Figure 9: Elimination of p22- 



both cases, we get the exact limit in a finite number of oper- 
ations) . The possibility of applying such a links elimination 
method in the computation cost reduction of the eigenvector 
problem or to solve the linear equation A.X — B may be 
an interesting question. As the above illustration example 
shows, we can apply up to the point the solution X becomes 
explicit (no more links). However the cost of the link elimi- 
nation is proportional to the number of incoming links and 
in the context of a very large matrix, such transformation 
may not be cost effective, because we may produce a very 
connected graph in the middle of the above transformation. 

Also, it would be interesting to investigate further the idea 
of applying such a transformation for the purpose of nodes 
clustering problem. 

Below, we show one application case of such a transfor- 
mation. 

3.3 Application of the graph transformation 
for the convergence condition 

From P{c), we apply the diagonal suppression method. 
The result is Q such that: 



and if i 7^ j: 



q^J 



1 



= 



The D-iteration for P{c) is convergent if and only if the 
D-iteration is convergent for Q. As for the Gauss-Seidel 
iteration, the D-iteration converges when the spectral radius 
of Q is strictly less than 1. 

Remark 3. The matrix Q may be obtained directly from 
A.X = B, dividing each line i by an. 

3.4 Another graph transformation 

From A.X — B, we can apply a more natural transfor- 
mation for the fiuid diffusion: because of the column based 
diffusion reduction condition, we set: x'i = Xi x an and we 
divide by an the i-th column vector of A to get A' . Then 
we get Q' = Id — A' such that: 

qu = 



The advantage of this approach is that this formulation 
is simpler to be taken into account for the sequence I opti- 
mization: for instance, for the greedy one step vision opti- 
mization (cf. Section 

3.5 Example 

To illustrate the different approaches and for a simple 
comparison, we introduce the following case: 



A 



/5 


-2 



1 



1 



3/ 



and B 



1 
1 

VJ 



The results are shown on Figure [TCTl we compared Jacobi, 
Power iteration (with c = 1/8), Gauss-Seidel, D-iteration us- 
ing Q with cyclical sequence / = 1,2,3,4,1,2... (D-iter/Q: 
CYC) and D-iteration with greedy approach taking the node 
with the maximum fluid in absolute value (D-iter/Q: Creedy). 
For the D-iteration, we assumed here that A'^ diffusions are 
equivalent to one matrix- vector product iteration. We would 
get the same result considering a finer iteration cost count 
based on the number of link Oij utilization. 
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Figure 10: Convergence comparison. 

To illustrate further the impact of the choice of the se- 
quence I, we added in Figure [TT1 the result obtained when 
applying the D-iteration on Q' when taking the node that 
maximizes step by step the Li-norm of Fn+i, or equivalently 
the fiuid reduction in one step (D-iter/Q': Greedy). 

This example is only for illustration: in this particular 
case, the gain brought by the D-iteration is of the order 
of the gain brought by the Gauss-Seidel compared to the 
Jacobi iteration. 

A much larger gain with D-iteration is expected with large 
sparse matrix and [5] gives an illustration of this in the 
context of PageRank on the web graph. And more impor- 
tantly, we can efficiently and naturally distribute the pro- 
posed method. 

4. OPEN PROBLEMS 
4.1 Optimization problem 
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Figure 11: Convergence comparison. 

As we showed, when the fluid reduction condition is sat- 
isfled, D-iteration converges whatever the choice of the se- 
quence / (it is only required that the difi'usion is apphed 
an infinity of times on each vector entry) . The convergence 
speed is dependent on the way the sequence / is applied. 
When we apply a cyclical order, the D-iteration's perfor- 
mance should be intuitively close to the performance of the 
Gauss-Seidel iteration. Then the natural question is: is 
there an optimal sequence for an optimal convergence speed 
to the limit or equivalently an optimal sequence to acceler- 
ate the fluid reduction to zero ? An empirical solution has 
been proposed in [S] in the context of PageRank associated 
matrix by: 

In = are max ^- 

^ . \{#im + l)xi#outi + l)J 

In a general case when the D-iteration is applied on a ma- 
trix Q' , we could define a cost proportional to the num- 
ber of outgoing links with a gain equal to the fluid reduc- 
tion factor, so we could replace {Fn-\)i/ {ij^outi -\- 1) by 
|(K_i),| X (1 - \q'j^\) / {#ouU + 1). The intuition to re- 
place the term l/(#mi -I- 1) is less obvious. So a possible 
empirical adaptation could be: 

In = argmax \{Fn-i)i\ x — — , , , . 

Note that we can not apply this intuitive expression to Q, 
because on one step vision, we may have a fluid increase. 
Also, if A is not strictly diagonally dominant, we can not 
use the above expression. For the time being, in such a 
case, we can possibly apply again: 

f \{Fu-i)i\ \ 
In = arg max '-^ 

The author believes that while being a very complex theo- 
retical problem a better solution can be built here. We hope 
to address those questions in a future paper. 

4.2 Application of the graph transformation 

As it has been remarked above, it may be interesting to in- 
vestigate further the idea of applying graph transformations 
described in Section [3] for the purpose of nodes clustering 



problem or for a faster convergence when mixed with the 
iteration methods. 

Remark 4. It is interesting to notice that X{P, Fq—P.Fq) = 
Fo- This is obvious from the algebraic power series formu- 
lation of X , but not that obvious from the point of view of 
the diffusion. 

5. CONCLUSION 

In this paper, we presented the D-iteration method, ini- 
tially introduced in the PageRank eigenvector problem, to 
solve efficiently the linear equation A.X = B. 

We believe that we have here a promising new intuitive 
representation and approach that can be applied in a very 
large scope of linear problems. 
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